Given a conventional FC layer, we denote $w_i \in \mathbb{R}^{m_i}$ and $a_i \in \mathbb{R}^{C_i}$ as its weights and features in the $i$-th layer, where $m_i = C_i \times C_{i-1}$ and $C_i$ represents the number of output channels of the $i$-th layer. Then we have the following:

$$a_i = a_{i-1} \otimes w_i, \qquad (6.40)$$

where $\otimes$ denotes full-precision multiplication. As mentioned above, the BNN model aims to binarize $w_i$ and $a_i$ into $P_{RB}(w_i)$ and $P_{RB}(a_i)$. For simplicity, in this chapter we denote $P_{RB}(w_i)$ and $P_{RB}(a_i)$ as $b_{w_i} \in \mathbb{B}^{m_i}$ and $b_{a_i} \in \mathbb{B}^{C_i}$, respectively.

Then, we use efficient XNOR and Bit-count operations to replace the full-precision operations. Following [199], the forward process of the BNN is

$$a_i = b_{a_{i-1}} \odot b_{w_i}, \qquad (6.41)$$

where $\odot$ represents efficient XNOR and Bit-count operations.
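To make the XNOR/Bit-count replacement concrete, below is a minimal sketch in plain Python/NumPy (function and variable names are ours, chosen for illustration) of the identity behind Eq. 6.41: for two $\{-1,+1\}$ vectors packed as bits, the dot product equals $2 \cdot \mathrm{popcount}(\mathrm{XNOR}) - n$, so no multiply-accumulate is needed.

```python
import numpy as np

def binary_dot_xnor(x_bits: int, y_bits: int, n: int) -> int:
    """Dot product of two {-1,+1} vectors of length n, each packed into the
    low n bits of an int (bit=1 encodes +1, bit=0 encodes -1), computed with
    an XNOR followed by a bit-count instead of multiply-accumulate."""
    mask = (1 << n) - 1                 # keep only the n valid bit positions
    xnor = ~(x_bits ^ y_bits) & mask    # 1 wherever the two signs agree
    agree = bin(xnor).count("1")        # bit-count (popcount)
    return 2 * agree - n                # (#agree) - (#disagree)

# Sanity check against ordinary full-precision multiplication.
rng = np.random.default_rng(0)
a = rng.choice([-1, 1], size=16)
b = rng.choice([-1, 1], size=16)
pack = lambda v: int("".join("1" if s > 0 else "0" for s in v), 2)
assert binary_dot_xnor(pack(a), pack(b), 16) == int(a @ b)
```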

Based on XNOR-Net [199], we introduce a learnable channel-wise scale factor to modulate the amplitude of the real-valued convolution. Aligned with the Batch Normalization (BN) and activation layers, the process is formulated as

$$b_{a_i} = \operatorname{sign}\big(\Phi(\alpha_i \circ b_{a_{i-1}} \odot b_{w_i})\big), \qquad (6.42)$$

where we divide the data flow in POEM into units for a detailed discussion. In POEM, the original output feature $a_i$ is first scaled by a channel-wise scale factor (vector) $\alpha_i \in \mathbb{R}^{C_i}$ to modulate the amplitude of its full-precision counterpart. It then enters $\Phi(\cdot)$, which represents a composite function built by stacking several layers, e.g., the BN layer, the non-linear activation layer, and the max-pooling layer. The output is then binarized through the sign function to obtain the binary activations $b_{a_i} \in \mathbb{B}^{C_i}$, where $\operatorname{sign}(\cdot)$ returns $+1$ if the input is greater than zero and $-1$ otherwise. The 1-bit activation $b_{a_i}$ can then be used for the efficient XNOR and Bit-count operations of the $(i+1)$-th layer.
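As an illustration of Eq. 6.42, the following is a minimal PyTorch sketch of one such unit; the class and attribute names are ours rather than from the POEM implementation, the XNOR/Bit-count kernel is emulated by an ordinary matrix product of $\{-1,+1\}$ tensors, and the non-differentiability of $\operatorname{sign}(\cdot)$ (typically handled with a straight-through estimator during training) is ignored.

```python
import torch
import torch.nn as nn

class BiFCUnit(nn.Module):
    """Sketch of one Bi-FC unit following Eq. 6.42: binary matmul ->
    channel-wise scale alpha -> Phi (BN + activation) -> sign."""
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_channels, in_channels))
        self.alpha = nn.Parameter(torch.ones(out_channels))  # channel-wise scale
        self.bn = nn.BatchNorm1d(out_channels)                # part of Phi(.)
        self.act = nn.Hardtanh()                              # part of Phi(.)

    def forward(self, b_a_prev: torch.Tensor) -> torch.Tensor:
        b_w = torch.sign(self.weight)      # 1-bit weights b_w
        a = b_a_prev @ b_w.t()             # stands in for XNOR + Bit-count
        a = self.alpha * a                 # modulate the amplitude
        a = self.act(self.bn(a))           # Phi(.)
        return torch.sign(a)               # 1-bit activations b_a

# Example: 1-bit input activations of a previous layer feeding one unit.
x = torch.sign(torch.randn(8, 64))
unit = BiFCUnit(64, 128)
out = unit(x)                              # shape (8, 128), values in {-1, +1}
```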

6.3.3 Supervision for POEM

To constrain Bi-FC to have binarized weights with amplitudes similar to their real-valued counterparts, we introduce a new loss function in our supervision for POEM. We consider that the unbinarized weights should be reconstructable from the binarized weights, as revealed in Eq. 6.38. Accordingly, we define the reconstruction loss as

$$L_R = \frac{1}{2}\,\big\|w_i - \alpha_i \circ b_{w_i}\big\|_2^2, \qquad (6.43)$$

where $L_R$ is the reconstruction loss.
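For concreteness, a one-layer version of Eq. 6.43 might look as follows in PyTorch; this is a sketch under our own naming assumptions, treating $\alpha_i$ as a per-output-channel vector that scales the rows of the weight matrix.

```python
import torch

def reconstruction_loss(w: torch.Tensor, alpha: torch.Tensor) -> torch.Tensor:
    """L_R of Eq. 6.43 for one layer: 0.5 * ||w - alpha o sign(w)||_2^2,
    where w has shape (C_out, C_in) and alpha has shape (C_out,)."""
    b_w = torch.sign(w)                    # binarized weights b_w
    recon = alpha.unsqueeze(1) * b_w       # alpha o b_w, applied channel-wise
    return 0.5 * (w - recon).pow(2).sum()
```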

Taking into account the impact of $\alpha_i$ on the layer output, we define the learning objective of our POEM as

$$\arg\min_{\{w_i,\,\alpha_i,\,p_i\},\; i \in N} \; L_S(w_i, \alpha_i, p_i) + \lambda L_R(w_i, \alpha_i), \qquad (6.44)$$

where $p_i$ denotes the other parameters of the real-valued layers in the network, e.g., the BN layer, the activation layer, and the unbinarized fully-connected layer; $N$ denotes the number of layers in the network; and $L_S$ is the cross-entropy loss.

Here $\lambda$ is a hyperparameter. Unlike binarization methods such as XNOR-Net [199] and Bi-Real Net [159], where only the reconstruction loss is considered in the weight calculation, our discrete optimization method computes the Bi-FC layers by considering the reconstruction loss and the softmax loss in a unified framework. By fine-tuning the value of $\lambda$, our proposed POEM achieves much better performance than XNOR-Net, which shows the effectiveness of the combined loss over the softmax loss alone.
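A hypothetical training step combining the two terms of Eq. 6.44 might look as follows in PyTorch; the names (`poem_training_step`, `lam`, the per-layer `weight`/`alpha` attributes) and the value of $\lambda$ are our own assumptions rather than POEM's actual implementation, and the straight-through handling of the sign function is again omitted.

```python
import torch
import torch.nn.functional as F

def layer_recon_loss(w: torch.Tensor, alpha: torch.Tensor) -> torch.Tensor:
    # L_R of Eq. 6.43 for one layer: 0.5 * ||w - alpha o sign(w)||_2^2
    return 0.5 * (w - alpha.unsqueeze(1) * torch.sign(w)).pow(2).sum()

def poem_training_step(model, optimizer, x, target, lam=1e-4):
    """One optimization step on L_S + lambda * L_R (Eq. 6.44), assuming
    `model` is a stack of binary FC units, each exposing a real-valued
    .weight matrix and a channel-wise .alpha vector."""
    optimizer.zero_grad()
    logits = model(x)
    loss_s = F.cross_entropy(logits, target)        # cross-entropy loss L_S
    loss_r = sum(layer_recon_loss(m.weight, m.alpha)
                 for m in model.modules() if hasattr(m, "alpha"))
    loss = loss_s + lam * loss_r                    # combined objective, Eq. 6.44
    loss.backward()
    optimizer.step()
    return loss.item()
```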